Enabling index scans for OID cast patterns in system views#4868
Conversation
|
/code-review |
kuntalghosh
left a comment
There was a problem hiding this comment.
AI Code Review
Verdict: COMMENT
Based on the provided code changes, here are my feedback and suggestions:
Issue 1 - In contrib/babelfishpg_tsql/src/hooks.c at line 59: #include ordering:
#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "access/stratnum.h"
#include "parser/analyze.h"The #include "access/stratnum.h" is placed after the optimizer/ includes, breaking alphabetical ordering by directory prefix. The project convention groups includes alphabetically by path.
Suggested fix:
#include "access/stratnum.h"
...
#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"Issue 2 - In contrib/babelfishpg_tsql/src/pltsql.h at lines 2523–2541: Unrelated InsertExecContext declarations bundled with an index-scan hook PR:
typedef struct InsertExecContext
{
Oid temp_table_oid;
Oid target_rel_oid;
bool is_target_relation_modified;
} InsertExecContext;
extern InsertExecContext insert_exec_ctx;
extern Oid create_insert_exec_temp_table(...);
extern void flush_insert_exec_temp_table(...);
extern DestReceiver *CreateInsertExecDestReceiver(void);These declarations expose an INSERT...EXEC API surface that has no corresponding implementation in this PR. No .c file defines these symbols, so they are dead code at this point. Including them here creates a confusing scope for review — the PR title and description focus entirely on the OID cast index-scan optimization (BABEL-6814), yet these declarations belong to a different feature (INSERT...EXEC support).
This is not a blocking issue since the declarations are inert (no linker impact when unreferenced), but it would be cleaner to:
- Land them in the PR that introduces the implementation, or
- Guard them behind a comment noting the follow-up PR that will use them.
The security review noted that the implementation PR must be carefully reviewed for state-reset on error paths, TOCTOU on target_rel_oid, and authorization checks in CreateInsertExecDestReceiver.
Issue 3 - In test/JDBC/input/BABEL-6814-vu-prepare.sql: Test setup creates 100 tables and 500 schemas for plan stability:
WHILE @i <= 100
BEGIN
SET @sql = 'CREATE TABLE babel_6814_extra_' + CONVERT(VARCHAR, @i) + ' (id INT)';
...
END
...
WHILE @j <= 500
BEGIN
SET @sql2 = 'CREATE SCHEMA babel_6814_schema_' + CONVERT(VARCHAR, @j);
...
ENDCreating 100 extra tables + 100 indexes + 500 schemas to force planner cost estimates toward index scans is a heavy-weight test fixture. This significantly increases test runtime (especially in upgrade tests where it runs against every version). Consider whether a smaller cardinality (e.g., 20 tables, 50 schemas) would be sufficient to tip the planner, or whether SET enable_seqscan = off in the psql EXPLAIN section would produce deterministic plans without the bulk object creation.
Issue 4 - Advisory: EXPLAIN plan tests may be fragile across PostgreSQL major versions:
The EXPLAIN (COSTS OFF) tests (Tests 16–22) assert specific plan shapes (e.g., "Index Scan using pg_class_oid_index"). While the hook guarantees the index-scannable rewrite is available, the planner's final plan choice depends on table statistics, catalog size, and cost parameters. The -- parallel_query_expected variant already shows different plan wrappers (Gather nodes). If PostgreSQL's cost model changes across major versions, these tests could produce spurious failures. The existing mitigation (bulk object creation) helps but is not guaranteed for all future PG versions.
Consider adding a comment in the test file noting that these EXPLAIN tests may need updating when PostgreSQL's cost model or parallel query thresholds change.
Missing Tests
No critical gaps. The test coverage is thorough:
- Both operand orderings (left/right cast patterns)
- Correlated subqueries, joins, IN-list subqueries
- Multiple catalog tables (pg_class, pg_index, pg_attribute, pg_namespace)
- Non-existent OID (zero-row case)
- Parallel query variant
- EXPLAIN plan verification
One minor gap: no test for the contain_volatile_functions guard — i.e., a query like WHERE (oid)::int4 = nextval('seq') that should NOT be rewritten to use an index scan. This is a correctness edge case rather than a functional gap since a volatile expression would still produce correct results (just without the optimization).
Notes
- The PR description references BABEL-6435 (OBJECTPROPERTYEX C rewrite) but this PR only implements BABEL-6814 (OID cast index scans). The description content about OBJECTPROPERTYEX appears to be carried over from the cherry-pick source PR #4829 and is somewhat misleading for this standalone change. The actual code change here is focused and well-scoped.
- Security review (0 actionable findings): The planner hook is semantically correct —
int4andoidshare 32-bit representation,RelabelTypeis binary-identity,oideqis leakproof, andsecurity_levelis properly propagated. The hook correctly chains withprev_match_opclause_to_indexcol_hookand restores it on uninstall. - The
InsertExecContextdeclarations inpltsql.hshould receive a dedicated security review when their implementation lands. - Upgrade schedule files correctly add
BABEL-6814to all version schedules, ensuring the test runs during minor version upgrades.
Note: This AI review is meant to augment, not replace, human review. Use the suggestions with discretion.
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
9a738bc to
10be049
Compare
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
|
/code-review |
2 similar comments
|
/code-review |
|
/code-review |
Adding a hook in match_opclause_to_indexcol() that allows babelfish to provide index clauses when the built-in logic cannot match an OpExpr to an index column. This enables extensions to handle type-cast patterns that prevent index usage due to operator family mismatches. Extension PR: babelfish-for-postgresql/babelfish_extensions#4868 Task: BABEL-6814 Authored-by: Rucha Kulkarni ruchask@amazon.com
kuntalghosh
left a comment
There was a problem hiding this comment.
AI Code Review
Verdict: COMMENT
Based on the provided code changes, here are my feedback and suggestions:
Issue 1 - In contrib/babelfishpg_tsql/src/hooks.c at line 59: #include ordering:
#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "access/stratnum.h"
#include "parser/analyze.h"The #include "access/stratnum.h" is placed after the optimizer/ group, breaking the directory-prefix grouping convention used throughout this file. While the file already has some ordering inconsistencies, new additions should follow the convention.
Suggested fix:
#include "access/stratnum.h"
...
#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"Issue 2 - In contrib/babelfishpg_tsql/src/pltsql.h at lines 2523–2541: Unrelated InsertExecContext declarations in an index-scan optimization PR:
typedef struct InsertExecContext
{
Oid temp_table_oid;
Oid target_rel_oid;
bool is_target_relation_modified;
} InsertExecContext;
extern InsertExecContext insert_exec_ctx;
extern Oid create_insert_exec_temp_table(...);
extern void flush_insert_exec_temp_table(...);
extern DestReceiver *CreateInsertExecDestReceiver(void);These declarations expose an INSERT...EXEC API surface that has no corresponding implementation in this PR — no .c file defines these symbols. Including dead forward declarations in a PR focused on OID cast index-scan optimization (BABEL-6814) confuses the review scope. They are inert (no linker impact when unreferenced), but would be cleaner landing in the PR that introduces their implementation, or guarded by a comment noting the follow-up PR.
The security review confirmed these are harmless in isolation, but flagged that the future implementation PR must be carefully reviewed for: SQL injection in create_insert_exec_temp_table (all const char * params imply dynamic DDL), stale OID exposure from the global insert_exec_ctx on transaction abort, and DestReceiver lifecycle on error paths.
Issue 3 - In test/JDBC/input/BABEL-6814-vu-prepare.sql: Heavy test fixture (100 tables + 500 schemas) for plan stability:
WHILE @i <= 100
BEGIN
SET @sql = 'CREATE TABLE babel_6814_extra_' + CONVERT(VARCHAR, @i) + ' (id INT)';
...
END
...
WHILE @j <= 500
BEGIN
SET @sql2 = 'CREATE SCHEMA babel_6814_schema_' + CONVERT(VARCHAR, @j);
...
ENDCreating 100 extra tables, 100 indexes, and 500 schemas to force planner cost estimates toward index scans is a heavyweight fixture. This adds significant runtime to every upgrade test version (60+ schedules now include BABEL-6814). Consider whether a smaller cardinality (e.g., 20 tables, 50 schemas) would be sufficient to tip the planner, or whether SET enable_seqscan = off in the psql EXPLAIN section would produce deterministic plans without the bulk object creation.
Issue 4 - Advisory: EXPLAIN plan tests may be fragile across PostgreSQL major versions:
The EXPLAIN (COSTS OFF) tests (Tests 16–22) assert specific plan shapes like "Index Scan using pg_class_oid_index". While the hook guarantees the rewrite is available, the planner's final choice depends on table statistics and cost parameters. The -- parallel_query_expected variant already shows different plan wrappers (Gather nodes). If PostgreSQL's cost model changes across major versions, these tests could produce spurious failures.
Consider adding a brief comment in the test file noting these EXPLAIN tests may need updating when PostgreSQL's cost model or parallel query thresholds change.
Missing Tests
No critical gaps. The test coverage is thorough:
- Both operand orderings (left/right cast patterns)
- Correlated subqueries, joins, IN-list subqueries
- Multiple catalog tables (pg_class, pg_index, pg_attribute, pg_namespace)
- Non-existent OID (zero-row case)
- Parallel query variant
- EXPLAIN plan verification
One minor gap: no test for the contain_volatile_functions guard — i.e., a query like WHERE (oid)::int4 = nextval('seq') that should NOT be rewritten. This is a correctness edge case rather than a functional gap since a volatile expression would still produce correct results (just without the optimization).
Notes
- Prior review issues addressed: The previous AI review (DISMISSED) raised the same Issues 1–4 above. They remain advisory and have not been addressed in subsequent commits. The human reviewer (
tanscorpio7) has approved the PR. - Security review (0 findings): The planner hook is semantics-preserving —
int4andoidshare 32-bit representation,RelabelTypeis binary-identity,oideqis leakproof, andsecurity_levelis properly propagated. The hook correctly chains withprev_match_opclause_to_indexcol_hookand restores it on uninstall. No volatile functions bypass, no pointer aliasing, no resource leaks. - The
InsertExecContextdeclarations should receive a dedicated security review when their implementation lands. - The
1_GRANT_SCHEMA-vu-verify.mixchange addsANALYZEon system catalogs at the start of the test to stabilize plan choices — this is reasonable and helps prevent test flakiness. - Upgrade schedule files correctly add
BABEL-6814to all version schedules, ensuring the test runs during minor version upgrades. - Cherry-pick context: This PR is cherry-picked from PR #4829 which targets BABEL_6_X_DEV. The target branch
BABEL_5_X_DEVis correct for the 5.x backport.
Note: This AI review is meant to augment, not replace, human review. Use the suggestions with discretion.
dbb532c
into
babelfish-for-postgresql:BABEL_5_X_DEV
Description
Implementing match_opclause_to_indexcol_hook to rewrite expressions like (oid)::integer = value to oid = value::oid, enabling index usage in system catalog views.
Following improvement is observed in SSMS Queries (36k objects):
Cherry-picked from: #4829
Engine PR: babelfish-for-postgresql/postgresql_modified_for_babelfish#774
Issues Resolved
Task: BABEL-6814
Authored-by: Rucha Kulkarni ruchask@amazon.com
Signed-off-by: Rucha Kulkarni ruchask@amazon.com
Engine PR: babelfish-for-postgresql/postgresql_modified_for_babelfish#762
Test Scenarios Covered
Since this is a planner only improvements, we have not added any tests. There are existing tests which check correctness.
Use case based -
Boundary conditions -
Arbitrary inputs -
Negative test cases -
Minor version upgrade tests -
Major version upgrade tests -
Performance tests -
Tooling impact -
Client tests -
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.